The data I will be using consists of beers and their statistics. It was acquired on 2017/03/03 from Ratebeer, an excellent beer rating site, using a scraper I wrote. I do not claim to own the data, nor will I share it here; anyone interested in it can scrape it themselves with the supplied scraper. The data is used solely for educational purposes.
I’ve always wondered what makes a good beer. Is it the style? The strength? The country it comes from? Or is it just the taste? One could probably guess it’s mainly the taste, but this course and its final project give an excellent chance to find out whether good ratings correlate with beer style, strength and the like.
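# stringsAsFactors = TRUE keeps the categorical columns as factors (the default became FALSE in R 4.0.0)
beers <- read.csv("data/wrangled_beers.csv", header = TRUE, quote = "\"",
                  row.names = 1, stringsAsFactors = TRUE)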
summary(beers)
##     retired              brewery                  country
##  Mode :logical   BrewDog            : 344   Belgium       :1637
##  FALSE:2660      Mikkeller          : 197   Czech Republic:  51
##  TRUE :666       De Struise Brouwers: 137   Estonia       :  69
##  NA's :0         De Proefbrouwerij  : 131   Finland       : 634
##                  Brouwerij Alvinne  : 129   Scotland      : 935
##                  Stadin Panimo      : 125
##                  (Other)            :2263
##            style               special        score      score_style
##  Belgian Ale         : 280   Autumn :  32   high    :798   high    :814
##  India Pale Ale (IPA): 241   Series : 128   low     :832   low     :861
##  Belgian Strong Ale  : 229   Special: 885   med_high:861   med_high:834
##  Imperial Stout      : 186   Spring :  26   med_low :835   med_low :817
##  American Pale Ale   : 135   Summer :  44
##  Abbey Tripel        : 129   Winter : 114
##  (Other)             :2126   no     :2097
##      abv          calories       ratings    weighted_avg_score
##  high    :707   high    :707   high    :829   high    :807
##  low     :847   low     :847   low     :865   low     :857
##  med_high:952   med_high:952   med_high:805   med_high:838
##  med_low :820   med_low :820   med_low :827   med_low :824
The variables of interest are the beer style, the different scores, the number of ratings, calories, alcohol content and whether the beer is a seasonal, series or special release.
The beers were chosen by selecting breweries, from a handful of countries, that had at least 80 beers in their selection. This means small and mid-sized breweries are not present in the data. A line had to be drawn somewhere, or else the volume of data would have been gigantic.
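The selection rule itself is easy to express in code. The actual selection was made at scraping time, so the sketch below only illustrates the threshold on made-up data; the function `keep_big_breweries` and the `all_beers` data frame are hypothetical.

```r
# Hypothetical sketch of the selection rule: keep only breweries that
# have at least `min_beers` beers. `all_beers` is made-up toy data.
keep_big_breweries <- function(df, min_beers = 80) {
  counts <- table(df$brewery)                    # beers per brewery
  big <- names(counts)[counts >= min_beers]      # breweries over the line
  df[df$brewery %in% big, , drop = FALSE]
}

all_beers <- data.frame(
  brewery = c(rep("Big Brewery", 85), rep("Small Brewery", 12)),
  beer    = paste("Beer", 1:97)
)

selected <- keep_big_breweries(all_beers)
table(as.character(selected$brewery))
```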
The numeric variables were standardized and then split into four categories (low, med_low, med_high and high) for further use.
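The wrangling script is not reproduced here, so the following is only a sketch of one way to standardize a numeric variable and split it into the four categories seen in the summary. The quartile cut points and the helper name `categorize` are assumptions: the roughly equal category counts in the summary are consistent with quartiles, but the source does not confirm it.

```r
# Hypothetical helper: standardize a numeric vector, then split it into
# four quartile-based categories (the quartile cut points are an assumption).
categorize <- function(x) {
  z <- as.numeric(scale(x))                           # mean 0, sd 1
  breaks <- quantile(z, probs = seq(0, 1, 0.25), na.rm = TRUE)
  cut(z, breaks = breaks,
      labels = c("low", "med_low", "med_high", "high"),
      include.lowest = TRUE)                          # keep the minimum in "low"
}

abv <- c(4.5, 5.0, 5.2, 6.5, 7.0, 8.5, 9.0, 11.0)     # made-up ABV values
abv_cat <- categorize(abv)
table(abv_cat)
```

Standardizing does not change which quartile a value falls into, so the categories would be identical without `scale()`. If many values tie, `quantile()` can return duplicate breaks and `cut()` then stops with an error, so heavily tied variables need a different rule.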
You can find the data wrangling script here and the scraper here.
For the analysis we’re going to drop the brewery and beer style variables, since they have so many categories that they make the plot unreadable, and the weighted average score, since it is pretty much the same as the standardized score.
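One way to check how close the weighted average score is to the standardized score is to cross-tabulate their categorical versions and measure how often they agree. A sketch on made-up vectors; with the real data one would pass `beers$score` and `beers$weighted_avg_score`. The helper name `agreement` is hypothetical.

```r
# Hypothetical helper: share of beers that land in the same category under
# both scores. Comparing as character avoids errors if factor level sets differ.
agreement <- function(a, b) mean(as.character(a) == as.character(b))

score    <- factor(c("high", "low", "med_low", "high", "med_high"))
weighted <- factor(c("high", "low", "med_low", "med_high", "med_high"))

table(score, weighted)      # matching categories sit on the diagonal
agreement(score, weighted)  # 0.8
```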
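library(tidyr)    # gather(); tidyr also re-exports %>%
library(ggplot2)
library(plotly)   # ggplotly()
custom_beers <- dplyr::select(beers, -c(brewery, weighted_avg_score, style))
# one bar chart per variable, counting the beers in each category
p <- gather(custom_beers) %>%
  ggplot(aes(value)) +
  facet_wrap("key", scales = "free") +
  geom_bar(fill = "#dd4814") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8))
ggplotly(p)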
# multiple correspondence analysis
library(FactoMineR)
mca <- MCA(custom_beers, graph = FALSE)
# summary of the model
summary(mca)
##
## Call:
## MCA(X = custom_beers, graph = FALSE)
##
##
## Eigenvalues
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6
## Variance 0.365 0.273 0.253 0.217 0.166 0.155
## % of var. 11.224 8.390 7.798 6.668 5.093 4.784
## Cumulative % of var. 11.224 19.614 27.412 34.081 39.174 43.957
## Dim.7 Dim.8 Dim.9 Dim.10 Dim.11 Dim.12
## Variance 0.142 0.132 0.130 0.127 0.126 0.123
## % of var. 4.382 4.075 4.000 3.902 3.862 3.800
## Cumulative % of var. 48.340 52.414 56.414 60.316 64.178 67.978
## Dim.13 Dim.14 Dim.15 Dim.16 Dim.17 Dim.18
## Variance 0.122 0.120 0.119 0.114 0.108 0.100
## % of var. 3.742 3.679 3.665 3.503 3.335 3.092
## Cumulative % of var. 71.720 75.400 79.064 82.567 85.902 88.994
## Dim.19 Dim.20 Dim.21 Dim.22 Dim.23 Dim.24
## Variance 0.093 0.089 0.078 0.065 0.033 0.000
## % of var. 2.853 2.731 2.385 2.011 1.025 0.000
## Cumulative % of var. 91.848 94.578 96.963 98.975 100.000 100.000
## Dim.25 Dim.26
## Variance 0.000 0.000
## % of var. 0.000 0.000
## Cumulative % of var. 100.000 100.000
##
## Individuals (the 10 first)
##
##                                                    Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3    ctr   cos2
## Plevna / Beer Hunters / Mallaskoski Mount Evans |  0.723  0.043  0.208 | -0.393  0.017  0.061 | -0.706  0.059  0.199 |
## Plevna / Bryggeri Helsinki Alt                  |  0.599  0.030  0.110 | -0.145  0.002  0.006 | -0.727  0.063  0.163 |
## Plevna / Mallaskoski Sugu Porter                |  0.401  0.013  0.058 | -0.308  0.010  0.034 | -0.708  0.060  0.180 |
## Plevna / Nøgne Ø Pohjantähti                    |  0.474  0.019  0.069 |  0.570  0.036  0.100 |  0.394  0.018  0.048 |
## Plevnan 67 Minutes IPA                          |  0.723  0.043  0.208 | -0.393  0.017  0.061 | -0.706  0.059  0.199 |
## Plevnan Amarillo Weizen                         |  0.159  0.002  0.009 | -0.216  0.005  0.017 | -0.665  0.052  0.158 |
## Plevnan Barley Wine (Sherry Aged)               | -0.791  0.052  0.181 |  0.566  0.035  0.092 | -0.240  0.007  0.017 |
## Plevnan Barley Wine 2012                        | -0.379  0.012  0.042 |  0.412  0.019  0.050 | -0.219  0.006  0.014 |
## Plevnan Biopukki                                |  0.474  0.019  0.075 | -0.255  0.007  0.022 | -0.618  0.045  0.128 |
## Plevnan Black Ale                               |  0.490  0.020  0.081 | -0.315  0.011  0.033 | -0.654  0.051  0.144 |
##
## Categories (the 10 first)
##
##                   Dim.1    ctr   cos2   v.test    Dim.2    ctr   cos2   v.test    Dim.3    ctr   cos2   v.test
## FALSE          |  0.049  0.065  0.010    5.627 | -0.113  0.469  0.051  -13.030 | -0.038  0.058  0.006   -4.431 |
## TRUE           | -0.195  0.261  0.010   -5.627 |  0.452  1.872  0.051   13.030 |  0.154  0.233  0.006    4.431 |
## Belgium        | -0.348  2.045  0.118  -19.766 | -0.383  3.310  0.142  -21.744 | -0.063  0.095  0.004   -3.554 |
## Czech Republic |  0.392  0.081  0.002    2.824 | -1.118  0.879  0.019   -8.048 | -0.084  0.005  0.000   -0.604 |
## Estonia        |  1.345  1.286  0.038   11.290 |  0.119  0.014  0.000    1.003 | -0.546  0.305  0.006   -4.580 |
## Finland        |  0.867  4.910  0.177   24.262 |  0.046  0.018  0.000    1.277 |  0.021  0.004  0.000    0.586 |
## Scotland       | -0.099  0.094  0.004   -3.568 |  0.692  6.169  0.187   24.947 |  0.140  0.273  0.008    5.057 |
## Autumn         |  0.227  0.017  0.000    1.288 |  0.244  0.026  0.001    1.385 | -0.216  0.022  0.000   -1.225 |
## Series         | -0.706  0.657  0.020   -8.145 |  0.369  0.240  0.005    4.256 |  0.538  0.549  0.012    6.206 |
## Special        | -0.423  1.629  0.065  -14.677 |  0.227  0.629  0.019    7.883 | -0.036  0.017  0.000   -1.250 |
##
## Categorical variables (eta2)
##
##               Dim.1 Dim.2 Dim.3
## retired     | 0.010 0.051 0.006 |
## country     | 0.246 0.227 0.014 |
## special     | 0.105 0.056 0.023 |
## score       | 0.651 0.065 0.026 |
## score_style | 0.249 0.055 0.005 |
## abv         | 0.697 0.853 0.976 |
## calories    | 0.697 0.853 0.976 |
## ratings     | 0.263 0.020 0.001 |
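To make the eigenvalue table less of a black box: MCA is essentially a correspondence analysis of the indicator matrix, which has one 0/1 column per category. The base-R sketch below runs that computation on random toy data (not the beer data). It is an illustration, not a replacement for `MCA()`, although its eigenvalues should agree with what FactoMineR reports for the same toy data.

```r
# Minimal MCA sketch in base R: correspondence analysis of the
# indicator (complete disjunctive) matrix. Random toy data, not the beer data.
set.seed(42)
lv <- c("low", "med_low", "med_high", "high")
toy <- data.frame(
  abv     = factor(sample(lv, 60, replace = TRUE)),
  score   = factor(sample(lv, 60, replace = TRUE)),
  retired = factor(sample(c("FALSE", "TRUE"), 60, replace = TRUE))
)

# Indicator matrix: one 0/1 column per observed category of every variable
Z <- do.call(cbind, lapply(toy, function(f)
  sapply(levels(f), function(l) as.numeric(f == l))))

P      <- Z / sum(Z)       # correspondence matrix
r_mass <- rowSums(P)       # row masses, 1/n for every row
c_mass <- colSums(P)       # category masses
# standardized residuals; their squared singular values are the eigenvalues
S <- diag(1 / sqrt(r_mass)) %*% (P - r_mass %o% c_mass) %*% diag(1 / sqrt(c_mass))

K <- ncol(Z)               # total number of categories
J <- ncol(toy)             # number of variables
eig <- svd(S)$d[seq_len(K - J)]^2   # at most K - J non-trivial dimensions
pct <- 100 * eig / sum(eig)         # the "% of var." row

round(eig, 3)
round(pct, 1)
```

The eigenvalues always sum to K/J - 1, the total inertia of an indicator matrix. For the beer data, K = 34 categories and J = 8 variables give K - J = 26 dimensions, matching Dim.1 to Dim.26 in the summary.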
# number of categories per variable, used to colour the category labels
cats <- sapply(custom_beers, function(x) nlevels(as.factor(x)))

# data frames for ggplot: category coordinates and individual (beer) coordinates
mca1_vars_df <- data.frame(mca$var$coord, Variable = rep(names(cats), cats))
mca1_obs_df <- data.frame(mca$ind$coord)

ggplot(data = mca1_obs_df, aes(x = Dim.1, y = Dim.2)) +
  geom_hline(yintercept = 0, colour = "gray70") +
  geom_vline(xintercept = 0, colour = "gray70") +
  geom_point(colour = "gray50", alpha = 0.7) +
  geom_density2d(colour = "gray80") +
  geom_text(data = mca1_vars_df,
            aes(x = Dim.1, y = Dim.2, label = rownames(mca1_vars_df), colour = Variable)) +
  ggtitle("MCA") +
  scale_colour_discrete(name = "Variable")
The goal was to find out what contributes to a high score. As you can (hopefully) see from the plot, the area around score_high in the top-left part of the plot is pretty deserted. The nearest point to it is the one for a high number of ratings.
Series beers seem to get high style scores.
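Alcohol content and calories lie exactly on top of each other, but that shouldn’t be surprising: the two variables have identical category counts in the summary above, which suggests the calorie figure is derived from the alcohol content.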
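The overlap is what you would expect if calories were computed from alcohol content by an increasing formula: the ordering of the beers is then identical, so any quartile-based categorization puts each beer in the same category for both variables. A tiny sketch with made-up values and a made-up linear formula (not Ratebeer’s actual one):

```r
# Made-up values: calories from a hypothetical increasing formula of abv
abv      <- c(4.5, 5.0, 5.2, 6.5, 7.0, 8.5, 9.0, 11.0)
calories <- 20 + 30 * abv

# Same ordering of beers ...
identical(rank(abv), rank(calories))  # TRUE

# ... hence the same quartile categories: every count lies on the diagonal
tab <- table(abv_cat = cut(abv, quantile(abv), include.lowest = TRUE, labels = FALSE),
             cal_cat = cut(calories, quantile(calories), include.lowest = TRUE, labels = FALSE))
tab
```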
Finland seems to be pretty close to low scores.
The dots in the plot are the individual beers. They don’t seem to align much with the category points, and there’s a huge clump in the top-right corner with no categories in sight.
I was also interested in how beer styles alone compare to the scores:
The percentages of explained variance are pretty low, but the plot is still fun to look at. It looks like the styles near the high score point are the more unconventional ones.
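library(factoextra)  # get_mca_var(), fviz_contrib(), fviz_screeplot()
# coordinates, cos2 and contributions of the variable categories
var <- get_mca_var(mca)
# contribution of each category to dimension 1 and to dimension 2
fviz_contrib(mca, choice = "var", axes = 1)
fviz_contrib(mca, choice = "var", axes = 2)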
The dashed red line is the expected average contribution to the dimension, i.e. the contribution each category would make if all contributed equally. Any category above it is considered important.
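Assuming factoextra draws the reference line for a single dimension at 100 % divided by the number of categories, its position can be worked out from the category counts in the summary. The per-variable level counts below are read off `summary(beers)`; the line position itself is the assumption.

```r
# Number of categories per variable, read off summary(beers) above
n_levels <- c(retired = 2, country = 5, special = 7, score = 4,
              score_style = 4, abv = 4, calories = 4, ratings = 4)
K <- sum(n_levels)   # 34 categories in total

# Assumed position of the dashed line for a single dimension:
# the share each category would get if all contributed equally
expected_contrib <- 100 / K
round(expected_contrib, 2)  # 2.94
```

For example, Finland’s ctr of 4.910 on Dim.1 in the summary output would sit above that line, while Belgium’s 2.045 would sit below it.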
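Both dimensions seem to consist mainly of alcohol content and calories, in line with their high eta2 values in the model summary (0.697 on Dim.1 and 0.853 on Dim.2 for both variables).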
fviz_screeplot(mca)
Two dimensions seem to be the best choice in this case, and that’s what we used.
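The low percentages in the scree plot are typical of MCA. The total inertia of an indicator matrix with K categories and J variables is K/J - 1, so each dimension’s share is its eigenvalue divided by that total. A worked check using the first six eigenvalues from the summary (K = 34 and J = 8 come from the category counts in `summary(beers)`):

```r
# Eigenvalues of the first six dimensions, copied from summary(mca) above
eig <- c(0.365, 0.273, 0.253, 0.217, 0.166, 0.155)
K <- 34   # categories
J <- 8    # variables

total_inertia <- K / J - 1            # 3.25 for an indicator matrix
pct <- 100 * eig / total_inertia      # share of each dimension, in %
round(pct, 1)                         # 11.2  8.4  7.8  6.7  5.1  4.8
round(sum(pct[1:2]), 1)               # 19.6

mean_eig <- total_inertia / (K - J)   # average eigenvalue, equal to 1 / J
```

This reproduces the "% of var." row of the summary up to rounding: the two retained dimensions together account for about 19.6 % of the inertia. The average eigenvalue, 1/J = 0.125, is sometimes used as an alternative reference point when deciding how many dimensions to keep.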
It looks like the question of what makes a good beer remains unanswered. The plots didn’t reveal much, although the beer style plot did offer something of value.
I think there are a few too many observations in my data, creating so much variation that MCA can’t really work with it. A smaller data set might have given clearer results, but it could also have oversimplified them. It would be interesting to see how adding more countries, for example some American or Australian beers, would affect the results. Price could be an interesting variable too.
One explanation could be that it’s all just a matter of taste, and has nothing to do with the data available here.